Focused Web Crawling Using Decay Concept and Genetic Programming

Authors

  • Mahdi Bazarganigilani
  • Ali Syed
Abstract

The ongoing rapid growth of web information is a theme of research in many papers. In this paper, we introduce a new optimized method for web crawling. Genetic programming is used to enhance the accuracy of the similarity measurement, which is applied to different parts of a web page, including the title and the body. The crawler then uses this optimized similarity measurement to traverse pages. To enhance the accuracy of crawling, we use the decay concept to limit the crawler to pages that are effective with respect to the search criteria. The decay measurement gives every page a score according to the search criteria; the score decreases as the crawler traverses to greater depth, and it can be revised according to the similarity of the page to the search criteria. We use three kinds of measurement to set the thresholds. The results show that using genetic programming together with dynamic decay thresholds leads to the best accuracy.
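The decay idea described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption: the multiplicative decay factor, the fixed thresholds (the paper reports dynamic thresholds), the function and parameter names, and the token-overlap similarity, which only stands in for the similarity measure tuned by genetic programming.

```python
from collections import deque


def jaccard(text, query):
    """Placeholder similarity: token overlap between page text and query.
    Assumption: the paper's GP-optimized measure would replace this."""
    a, b = set(text.lower().split()), set(query.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def decay_crawl(graph, pages, seed, query,
                decay=0.5, sim_threshold=0.3, min_score=0.2):
    """Breadth-first crawl over an in-memory link graph.

    graph: dict mapping url -> list of outgoing urls (hypothetical input)
    pages: dict mapping url -> page text (hypothetical input)
    Returns a dict of visited url -> decay score.
    """
    visited = {}
    queue = deque([(seed, 1.0)])
    while queue:
        url, inherited = queue.popleft()
        if url in visited:
            continue
        sim = jaccard(pages.get(url, ""), query)
        # A sufficiently similar page has its score revised upward (reset);
        # otherwise it keeps the decayed score inherited from its parent.
        score = 1.0 if sim >= sim_threshold else inherited
        if score < min_score:
            continue  # too far from any relevant page: prune this path
        visited[url] = score
        for link in graph.get(url, []):
            if link not in visited:
                # The score decreases with each additional level of depth.
                queue.append((link, score * decay))
    return visited
```

With these assumed values, a chain of irrelevant pages is followed for two hops beyond the last relevant page (scores 0.5 and 0.25) and abandoned at the third (0.125, below the 0.2 cutoff), while any relevant page found along the way resets the score to 1.0.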


Similar resources

Ranking Hyperlinks Approach for Focused Web Crawler

The World Wide Web is growing rapidly and many search engines do not cover all the visible pages. Therefore, a more effective crawling method is required to collect more accurate data. In this paper, we introduce an effective focused web crawler containing smart methods. In text analysis, similarity measurement applies to different parts of the Web pages including title, body, anchor text and U...


Design and Implementation of Focused Web Crawler Using Genetic Algorithm: An Approach to Web Mining

The speed at which the World Wide Web (WWW) is growing round the clock spreads its arms from smaller collections of web pages to a massive hub of web information, which gradually increases the complexity of the crawling process. Search engines handle enormous queries from different parts of the universe to retrieve most of the relevant results in response to user queries, and it is solely depe...


Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download domain-specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...


A Novel Hybrid Focused Crawling Algorithm to Build Domain-Specific Collections

The Web, containing a large amount of useful information and resources, is expanding rapidly. Collecting domain-specific documents/information from the Web is one of the most important methods to build digital libraries for the scientific community. Focused Crawlers can selectively retrieve Web documents relevant to a specific domain to build collections for domain-specific search engines or di...


Accurate and Efficient Crawling for Relevant Websites

Focused web crawlers have recently emerged as an alternative to the well-established web search engines. While the well-known focused crawlers retrieve relevant webpages, there are various applications which target whole websites instead of single webpages. For example, companies are represented by websites, not by individual webpages. To answer queries targeted at websites, web directories are...



Publication date: 2011